Learning Objectives

After completing this lesson, you’ll be able to:

Instructions

In this lesson, you will:

Resources

View Your Data at Any Point in the Workspace

Jennifer has edited her data’s schema using an AttributeManager. She knows the data changed because she can see the green attribute ports showing successful schema mapping. However, Jennifer wants to see the changes to her data in Data Preview.

Jennifer can do this using data caching. Data caching is an authoring mode that is enabled by default. When enabled, a local data cache is stored at every output port in the workspace. These caches let you view and compare data anywhere in your workspace.

Data caching is useful when you are authoring a workspace. It lets you use iterative and incremental development to add one transformer or feature type at a time, create a cache, and inspect it to confirm the data looks as you expect. It is particularly advantageous when working with web, database, or compressed data. It allows you to download, query, or extract data once and use a cache, saving time and effort when reading large datasets or making API calls. However, creating these caches takes time, so it’s wise to disable this mode when you want FME to run at peak efficiency.

We'll leave data caching on for now since it can speed up authoring. You can toggle data caching on and off using the button on the toolbar.

Partial Runs

With data caching enabled, you have access to another FME Workbench feature: partial runs.

Partial runs let you run specified sections of your workspace instead of the entire workspace. This feature works in tandem with data caching to enable incremental development. When you add a new transformer or feature type, you can run that new object independently and inspect its cache for any problems. Authoring workspaces this way saves time by allowing you to detect and fix problems early.

Partial runs use any connected caches “upstream” (i.e., earlier in the data flow). FME runs only as many transformers or feature types as the partial run requires, saving you time.
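To make the idea concrete, here is a minimal sketch in Python of how upstream caches shrink the amount of work in a partial run. It is a toy model, not FME's actual implementation: the `UPSTREAM` dictionary, the `objects_to_run` function, and the three-object workspace are all assumptions made for illustration.

```python
# Toy model (assumed for illustration): each object lists the objects
# that feed directly into it.
UPSTREAM = {
    "Reader": [],
    "AttributeManager": ["Reader"],
    "Writer": ["AttributeManager"],
}

def objects_to_run(target, valid_caches):
    """Return the objects that must run so `target` produces fresh output.

    Walk upstream from the target and stop at any object whose cache is
    valid, because its cached features can be reused instead of rerun.
    """
    to_run = []
    stack = [target]
    seen = set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name != target and name in valid_caches:
            continue  # reuse this cache; nothing further upstream is needed
        to_run.append(name)
        stack.extend(UPSTREAM[name])
    return sorted(to_run)

with_cache = objects_to_run("AttributeManager", valid_caches={"Reader"})
without_cache = objects_to_run("AttributeManager", valid_caches=set())
print(with_cache)     # ['AttributeManager']
print(without_cache)  # ['AttributeManager', 'Reader']
```

With a valid reader cache, only the AttributeManager runs. Without it, the reader has to run again too.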

When you change an object in your workspace, its cache and any caches “downstream” (i.e., later in the data flow) turn yellow, indicating they are invalid. Rerunning those objects makes their caches valid again. For example, if you changed the AttributeManager below, its cache would become invalid, but the upstream reader feature type cache would remain valid:

Invalid cache
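The invalidation behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `DOWNSTREAM` dictionary, the `invalidate` function, and the `NextTransformer` object are hypothetical and are not part of FME.

```python
# Toy model (assumed for illustration): each object maps to the objects
# directly downstream of it.
DOWNSTREAM = {
    "Reader": ["AttributeManager"],
    "AttributeManager": ["NextTransformer"],  # hypothetical downstream object
    "NextTransformer": [],
}

def invalidate(changed, caches):
    """Return a copy of `caches` with `changed` and everything downstream invalid."""
    result = dict(caches)
    pending = [changed]
    visited = set()
    while pending:
        name = pending.pop()
        if name in visited:
            continue
        visited.add(name)
        result[name] = "invalid"
        pending.extend(DOWNSTREAM[name])
    return result

caches = {"Reader": "valid", "AttributeManager": "valid", "NextTransformer": "valid"}
after_edit = invalidate("AttributeManager", caches)
print(after_edit)
# {'Reader': 'valid', 'AttributeManager': 'invalid', 'NextTransformer': 'invalid'}
```

The reader cache stays valid because it sits upstream of the change.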

When you click the Run button, FME runs any object that has no cache, has an invalid cache, or is a writer feature type. Continuing the example, if you hover over Run after making a change to only the AttributeManager:

Green highlight

FME will start from the cache in the Source Data bookmark, run the AttributeManager, and write the data out. The areas that will run are highlighted in green.
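The rule the Run button follows can be expressed as a simple predicate. The sketch below is a toy model, assuming each workspace object is described by a small dictionary; the `will_run` function and the `kind` and `cache` fields are names made up for illustration, not FME's own.

```python
def will_run(obj):
    """Apply the rule from the text: no cache, an invalid cache, or a writer."""
    return obj["cache"] in (None, "invalid") or obj["kind"] == "writer"

# State after editing only the AttributeManager (assumed for illustration).
workspace = [
    {"name": "Reader", "kind": "reader", "cache": "valid"},
    {"name": "AttributeManager", "kind": "transformer", "cache": "invalid"},
    {"name": "Writer", "kind": "writer", "cache": None},
]

highlighted = [obj["name"] for obj in workspace if will_run(obj)]
print(highlighted)  # ['AttributeManager', 'Writer']
```

The reader is skipped because its cache is still valid, which matches the green highlight in the example.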

Other options for partial runs exist depending on the object’s location on the canvas:

  • Run From This (or Selected if multiple objects are selected): All parts of the workspace downstream of a selected object (or objects) run.
  • Run Just This (or Selected): Only a selected object (or objects) runs.
  • Run To This (or Selected): All workspace parts upstream of a selected object (or objects) run.
  • Run Between Selected: All workspace parts between selected objects run.
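One way to picture the four options is as selections on a graph of the data flow. The Python sketch below is a simplified model with assumptions: the `EDGES` list, the function names, and the `Sorter` object are hypothetical; the selected objects are treated as part of every selection; and caches are ignored, so it shows the widest area each option could cover.

```python
# Toy data flow (assumed for illustration): (upstream, downstream) pairs.
EDGES = [
    ("Reader", "AttributeManager"),
    ("AttributeManager", "Sorter"),  # hypothetical extra transformer
    ("Sorter", "Writer"),
]

def _reach(start, forward=True):
    """Return `start` plus every object reachable along (or against) the flow."""
    found = set(start)
    pending = list(start)
    while pending:
        name = pending.pop()
        for src, dst in EDGES:
            here, there = (src, dst) if forward else (dst, src)
            if here == name and there not in found:
                found.add(there)
                pending.append(there)
    return found

def run_from(selected):
    return _reach(selected, forward=True)    # selection plus everything downstream

def run_just(selected):
    return set(selected)                     # only the selection

def run_to(selected):
    return _reach(selected, forward=False)   # selection plus everything upstream

def run_between(selected):
    return run_from(selected) & run_to(selected)  # objects lying between the selection

print(sorted(run_from({"AttributeManager"})))               # ['AttributeManager', 'Sorter', 'Writer']
print(sorted(run_to({"AttributeManager"})))                 # ['AttributeManager', 'Reader']
print(sorted(run_between({"AttributeManager", "Writer"})))  # ['AttributeManager', 'Sorter', 'Writer']
```

In practice, valid upstream caches would reduce what Run To This actually has to execute.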

This technique becomes very important as you build bigger workspaces. Use the green highlight to determine which parts of the workspace will run. Always run the smallest section of your workspace that meets your goals.

Scenario

Jennifer will use partial runs to speed up workspace authoring and debug her workspace as she works.

1) Open FME Workbench

2) Use Run To This

We want to run our workspace to ensure the AttributeManager worked properly. We could just click Run > Rerun Entire Workspace. However, as workspaces grow, running only the changed section is better. 

Run To This on the AttributeManager

AttributeManager feature cache

Table View

3) Write the Data

Now that we've confirmed the schema is correct, we want to write the data.

Partial runs
